Enriching a Text by Semantic Disambiguation for Information Extraction

Authors

  • Bernard Jacquemin
  • Caroline Brun
  • Claude Roux
Abstract

External linguistic resources have long been used in information extraction. Methods that rely on them enrich a document with semantically equivalent data in order to improve recall. For instance, some of these methods use synonym dictionaries to enrich a sentence with words that have a similar meaning. However, these methods have serious drawbacks, since words are usually synonyms only in restricted contexts. The method we propose here uses word sense disambiguation (WSD) rules to restrict the selection of synonyms to only those that match a specific syntactico-semantic context. We show how WSD rules are built and how information extraction techniques can benefit from the application of these rules.
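The idea of restricting synonyms to a matching context can be illustrated with a minimal sketch. The abstract does not give the authors' rule format, so everything here is an assumption made for illustration: the `WsdRule` structure, the `enrich` function, the toy rules, and the representation of a parsed sentence as (head, relation, dependent) triples are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WsdRule:
    """Hypothetical rule: `word` takes `synonyms` only when it stands in
    `relation` to a dependent found in `context_lemmas`."""
    word: str
    relation: str
    context_lemmas: frozenset
    synonyms: tuple


# Toy rules invented for this example; not taken from the paper.
RULES = [
    WsdRule("run", "object", frozenset({"company", "business", "shop"}),
            ("manage", "direct")),
    WsdRule("run", "object", frozenset({"marathon", "race", "mile"}),
            ("race", "sprint")),
]


def enrich(dependencies, rules):
    """Return, for each head word, the synonyms licensed by the first rule
    whose syntactic relation and context lemma both match.

    `dependencies` is a list of (head, relation, dependent) triples, assumed
    to come from some upstream syntactic parser.
    """
    enriched = {}
    for head, relation, dependent in dependencies:
        for rule in rules:
            if (rule.word == head
                    and rule.relation == relation
                    and dependent in rule.context_lemmas):
                enriched.setdefault(head, []).extend(rule.synonyms)
                break  # first matching rule wins for this triple
    return enriched


# "She runs a company": only the 'manage' sense is licensed.
deps = [("run", "subject", "she"), ("run", "object", "company")]
print(enrich(deps, RULES))  # {'run': ['manage', 'direct']}
```

A plain synonym dictionary would add "race" and "sprint" to this sentence as well; in the sketch, the context condition is what filters them out. When no rule matches, the word is left unenriched.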

Similar resources

Named Entity Disambiguation: a Hybrid Approach

Semantic annotation of named entities for enriching unstructured content is a critical step in development of Semantic Web and many Natural Language Processing applications. To this end, this paper addresses the named entity disambiguation problem that aims at detecting entity mentions in a text and then linking them to entries in a knowledge base. In this paper, we propose a hybrid method, com...

Presenting a method for extracting structured domain-dependent information from Farsi Web pages

Extracting structured information about entities from web texts is an important task in web mining, natural language processing, and information extraction. Information extraction is useful in many applications including search engines, question-answering systems, recommender systems, machine translation, etc. An information extraction system aims to identify the entities from the text and extr...

Enriching the WordNet Taxonomy with Contextual Knowledge Acquired from Text

This paper presents a possible solution for the problem of integrating contextual knowledge in the WordNet database. Contextual structures are derived from three sources: (1) minimal contexts-in the form of semantic nets transformations of WordNet glosses; (2) dynamic contexts rendered by webs of lexico-semantic paths revealing textual implied information and (3) static contexts-represented by ...

Information extraction from non-segmented text (on the material of weather forecast telegrams)

Both the domain and sublanguage specific approach to text analysis and information extraction is proposed. Texts under consideration are weather forecast telegrams written in Russian. Telegrams are an example of deviant text type, with lack of text segmentation means, a lot of abbreviations, syntactic and spelling mistakes. The presented work pursues the problem of text segmentation: a procedur...

A Framework for Enriching Lexical Semantic Resources with Distributional Semantics

We present an approach to combining distributional semantic representations induced from text corpora with manually constructed lexical-semantic networks. While both kinds of semantic resources are available with high lexical coverage, our aligned resource combines the domain specificity and availability of contextual information from distributional models with the conciseness and high quality ...

Journal title:
  • CoRR

Volume abs/cs/0506048

Publication date 2002